The Role Of Hubness in High-dimensional Data Analysis

Author

Nenad Tomasev

Abstract

Machine learning on intrinsically high-dimensional data is known to be challenging, a difficulty usually referred to as the curse of dimensionality. Designing machine learning methods that perform well in many dimensions is critical, since high-dimensional data arises often in practical applications; typical examples include textual, image, and multimedia feature representations, as well as time series and biomedical data. The hubness phenomenon [1] has recently come into focus as an important aspect of the curse of dimensionality that affects many instance-based machine learning systems. With increasing dimensionality, the distribution of instance relevance within the models tends to become long-tailed. A small number of hub points dominate the analysis and influence a disproportionate number of system predictions. Most of the remaining points are rarely or never retrieved in relevance queries, which results in information loss. High data hubness has been linked to poor system performance in many data domains. The dissertation [2] proposes several novel hubness-aware machine learning algorithms to improve the effectiveness of machine learning on intrinsically high-dimensional data. The proposed
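The long-tailed relevance distribution described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the dissertation's method: it assumes hubness is measured by the skewness of the k-occurrence counts (how often each point appears in other points' k-nearest-neighbor lists), a common choice in the hubness literature. The function names, the synthetic Gaussian data, the sample size, and k=5 are all hypothetical choices made for the example.

```python
import random


def k_occurrences(points, k):
    """Count how often each point appears in the k-nearest-neighbor lists of the others."""
    n = len(points)
    counts = [0] * n
    for i, p in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        ]
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Standardized third moment; larger positive values mean a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def gaussian_sample(n, dim, rng):
    """Hypothetical test data: n points drawn i.i.d. from a standard normal in dim dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (3, 50):
        counts = k_occurrences(gaussian_sample(300, dim, rng), k=5)
        never = sum(1 for c in counts if c == 0)
        print(f"dim={dim}: skewness={skewness(counts):.2f}, "
              f"max={max(counts)}, never retrieved={never}")
```

Comparing the printed skewness, the maximum count, and the number of never-retrieved points across the two dimensionalities gives a rough sense of how hubs and orphaned points show up in a kNN graph; the exact figures depend on the seed and on the assumed data distribution.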

Similar resources

A Study on Clustering High Dimensional Data Using Hubness Phenomenon

Data mining is the non-trivial process of extracting information from very large databases. In recent years, data repositories have come to hold high-dimensional data, which makes a complete search computationally infeasible in most data mining problems. To address this problem, clustering plays a vital role in handling both low-dimensional and high-dimensional data. Low dimensional data mak...

Clustering with Shared Nearest Neighbor-unscented Transform Based Estimation

Subspace clustering developed from the group of cluster objects in all subspaces of a dataset. When clustering high dimensional objects, the accuracy and efficiency of traditional clustering algorithms are very poor, because data objects may belong to diverse clusters in different subspaces comprised of different combinations of dimensions. To overcome the above issue, we are going to implement...

Hub Co-occurrence Modeling for Robust High-Dimensional kNN Classification

The emergence of hubs in k-nearest neighbor (kNN) topologies of intrinsically high dimensional data has recently been shown to be quite detrimental to many standard machine learning tasks, including classification. Robust hubness-aware learning methods are required in order to overcome the impact of the highly uneven distribution of influence. In this paper, we have adapted the Hidden Naive Bay...

An Improved Unsupervised Cluster based Hubness Technique for Outlier Detection in High dimensional data

Outlier detection in high dimensional data becomes an emerging technique in today’s research in the area of data mining. It tries to find entities that are considerably unrelated, unique and inconsistent with respect to the common data in an input database. It faces various challenges because of the increase of dimensionality. Hubness has recently been developed as an important concept and acts...

Class imbalance and the curse of minority hubs

Most machine learning tasks involve learning from high-dimensional data, which is often quite difficult to handle. Hubness is an aspect of the curse of dimensionality that was shown to be highly detrimental to k-nearest neighbor methods in high-dimensional feature spaces. Hubs, very frequent nearest neighbors, emerge as centers of influence within the data and often act as semantic singularitie...

Journal:

Informatica (Slovenia)

Volume 38

Publication date: 2014
